MSIQ: Joint Modeling of Multiple RNA-seq Samples for Accurate Isoform Quantification
Abstract
Next-generation RNA sequencing (RNA-seq) technology has been widely used to assess full-length RNA isoform abundance in a high-throughput manner. RNA-seq data offer insight into gene expression levels and transcriptome structure, enabling us to better understand the regulation of gene expression and fundamental biological processes. Accurate quantification of RNA isoforms from RNA-seq data is a challenging computational task due to the information loss in sequencing experiments. The recent accumulation of multiple RNA-seq data sets from the same biological condition provides new opportunities to improve isoform quantification accuracy. However, existing statistical or computational methods for multiple RNA-seq samples either pool the samples into one sample or assign equal weights to the samples when estimating isoform abundance. These methods ignore possible heterogeneity in the quality and noise levels of different samples, and may therefore yield biased and non-robust estimates. In this article, we develop a method named “joint modeling of multiple RNA-seq samples for accurate isoform quantification” (MSIQ) for more accurate and robust isoform quantification, which integrates multiple RNA-seq samples under a Bayesian framework. Our method aims to (1) identify the informative group of samples with homogeneous quality and (2) improve isoform quantification accuracy by jointly modeling multiple RNA-seq samples with more weight on the informative group. We show that MSIQ provides a consistent estimator of isoform abundance, and demonstrate the accuracy and effectiveness of MSIQ compared to alternative methods through simulation studies on D. melanogaster genes. We justify MSIQ’s advantages over existing approaches via application studies on real RNA-seq data of human embryonic stem cells and brain tissues. We also perform a comprehensive analysis of how isoform quantification accuracy is affected by RNA-seq sample heterogeneity and different experimental protocols.
Similar resources
MSIQ: Joint Modeling of Multiple RNA-seq Samples for Accurate Isoform Quantification by Wei
Next-generation RNA sequencing (RNA-seq) technology has been widely used to assess full-length RNA isoform abundance in a high-throughput manner. RNA-seq data offer insight into gene expression levels and transcriptome structures, enabling us to better understand the regulation of gene expression and fundamental biological processes. Accurate isoform quantification from RNA-seq data is challengi...
Statistical modeling of isoform splicing dynamics from RNA-seq time series data
MOTIVATION: Isoform quantification is an important goal of RNA-seq experiments, yet it remains problematic for genes with low expression or several isoforms. These difficulties may in principle be ameliorated by exploiting correlated experimental designs, such as time series or dosage response experiments. Time series RNA-seq experiments, in particular, are becoming increasingly popular, yet the...
WemIQ: an accurate and robust isoform quantification method for RNA-seq data
MOTIVATION: The deconvolution of isoform expression from RNA-seq remains challenging because of non-uniform read sampling and subtle differences among isoforms. RESULTS: We present a weighted-log-likelihood expectation maximization method on isoform quantification (WemIQ). WemIQ integrates an effective bias removal with a weighted expectation maximization (EM) algorithm to distribute reads amon...
Integrative analysis with ChIP-seq advances the limits of transcript quantification from RNA-seq.
RNA-seq is currently the technology of choice for global measurement of transcript abundances in cells. Despite its successes, isoform-level quantification remains difficult because short RNA-seq reads are often compatible with multiple alternatively spliced isoforms. Existing methods rely heavily on uniquely mapping reads, which are not available for numerous isoforms that lack regions of uniq...